Semiparametric Sparse Discriminant Analysis in Ultra-High Dimensions
Authors
Abstract
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its shortcomings in high-dimensional classification (Witten & Tibshirani 2011, Cai & Liu 2011, Mai et al. 2012, Fan et al. 2012). In this paper, we develop high-dimensional semiparametric sparse discriminant analysis (HD-SeSDA) that generalizes normal-theory discriminant analysis in two ways: it relaxes the Gaussian assumption and can handle non-polynomial (NP) dimension classification problems. If the underlying Bayes rule is sparse, HD-SeSDA can estimate the Bayes rule and select the true features simultaneously with overwhelming probability, as long as the logarithm of the dimension grows slower than the cube root of the sample size. Simulated and real examples are used to demonstrate the finite-sample performance of HD-SeSDA. At the core of the theory is a new exponential concentration bound for semiparametric Gaussian copulas, which is of independent interest.
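To make the semiparametric idea concrete, the sketch below illustrates the general Gaussian-copula recipe on a toy problem: map each feature to normal scores via its Winsorized empirical CDF, then run ordinary linear discriminant analysis on the transformed data. This is only an illustrative sketch of the copula transform, not the authors' HD-SeSDA algorithm (in particular, it performs no sparse estimation); the truncation constant `delta` follows a choice commonly used for nonparanormal estimators.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(X):
    """Gaussian-copula (normal-scores) transform: replace each feature by
    Phi^{-1} of its Winsorized empirical CDF, so marginals look Gaussian."""
    n = X.shape[0]
    # Truncation level commonly used for nonparanormal estimators (assumption)
    delta = 1.0 / (4.0 * n ** 0.25 * np.sqrt(np.pi * np.log(n)))
    U = np.apply_along_axis(rankdata, 0, X) / (n + 1.0)   # ranks -> (0, 1)
    return norm.ppf(np.clip(U, delta, 1.0 - delta))

# Toy two-class problem: lognormal features violate LDA's normality
# assumption, but a monotone transform restores approximate Gaussianity.
rng = np.random.default_rng(0)
n = 200
X = np.exp(np.vstack([rng.normal(0.0, 1.0, (n, 3)),
                      rng.normal(1.5, 1.0, (n, 3))]))
y = np.r_[np.zeros(n), np.ones(n)]

Z = normal_scores(X)                              # step 1: normal scores
mu0, mu1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
S = 0.5 * (np.cov(Z[y == 0], rowvar=False) +      # step 2: pooled covariance
           np.cov(Z[y == 1], rowvar=False))       # (equal class sizes here)
w = np.linalg.solve(S, mu1 - mu0)                 # step 3: LDA direction
pred = (Z @ w > w @ (mu0 + mu1) / 2).astype(float)
acc = (pred == y).mean()
```

In a genuinely high-dimensional setting one would replace step 3 with a sparse estimator of the discriminant direction, which is where the paper's feature-selection guarantees enter.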
Similar resources
Sparse semiparametric discriminant analysis
In recent years, a considerable amount of work has been devoted to generalizing linear discriminant analysis to overcome its shortcomings in high-dimensional classification (Witten and Tibshirani, 2011, Cai and Liu, 2011, Mai et al., 2012 and Fan et al., 2012). In this paper, we develop high-dimensional sparse semiparametric discriminant analysis (SSDA) that generalizes the normal-theory discr...
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among fea...
Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for le...
Discriminant Analysis through a Semiparametric Model
We consider a semiparametric generalisation of normal-theory discriminant analysis. The semiparametric model assumes that, after unspecified univariate monotone transformations, the class distributions are multivariate normal. We introduce an estimation procedure based on the distribution quantiles, in which the parameters of the semiparametric model are estimated directly without estimating th...
Semiparametric regression models with additive nonparametric components and high dimensional parametric components
This paper concerns semiparametric regression models with additive nonparametric components and high dimensional parametric components under sparsity assumptions. To achieve simultaneous model selection for both nonparametric and parametric parts, we introduce a penalty that combines the adaptive empirical L2-norms of the nonparametric component functions and the SCAD penalty on the coefficient...
Publication date: 2013